Applying Naive Bayes Classification to Google Play Apps Categorization

نویسنده

  • Olabenjo Babatunde
چکیده

There are over one million apps on Google Play Store and over half a million publishers. Having such a huge number of apps and developers can pose a challenge to app users and new publishers on the store. Discovering apps can be challenging if apps are not correctly published in the right category, and, in turn, reduce earnings for app developers. Additionally, with over 41 categories on Google Play Store, deciding on the right category to publish an app can be challenging for developers due to the number of categories they have to choose from. Machine Learning has been very useful, especially in classification problems such sentiment analysis, document classification and spam detection. These strategies can also be applied to app categorization on Google Play Store to suggest appropriate categories for app publishers using details from their application. In this project, we built two variations of the Naı̈ve Bayes classifier using open metadata from top developer apps on Google Play Store in other to classify new apps on the store. These classifiers are then evaluated using various evaluation methods and their results compared against each other. The results show that the Naı̈ve Bayes algorithm performs well for our classification problem and can potentially automate app categorization for Android app publishers on Google Play Store.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets

E-mail foldering or e-mail classification into user predefined folders can be viewed as a text classification/categorization problem. However, it has some intrinsic properties that make it more difficult to deal with, mainly the large cardinality of the class variable (i.e. the number of folders), the different number of e-mails per class state and the fact that this is a dynamic problem, in th...

متن کامل

In silico prediction of anticancer peptides by TRAINER tool

Cancer is one of the causes of death in the world. Several treatment methods exist against cancer cells such as radiotherapy and chemotherapy. Since traditional methods have side effects on normal cells and are expensive, identification and developing a new method to cancer therapy is very important. Antimicrobial peptides, present in a wide variety of organisms, such as plants, amphibians and ...

متن کامل

Integrating Multiple Internet Directories by Instance-based Learning

Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1608.08574  شماره 

صفحات  -

تاریخ انتشار 2016